Discriminative Weighted Alignment Matrices For Statistical Machine Translation

Authors

  • Nadi Tomeh
  • Alexandre Allauzen
  • François Yvon
Abstract

In existing phrase-based statistical machine translation (SMT) systems, the translation model relies on word-to-word alignments, which serve as constraints for the subsequent heuristic extraction and scoring processes. Word alignments are usually inferred in a probabilistic framework; yet only the single best alignment is retained, as if alignments were produced deterministically. In this paper, we explore ways to take into account the entire alignment matrix, where each alignment link is scored by its probability. In contrast to previous attempts, we use an exponential model to compute these probabilities, which enables us to achieve significant improvements on the NIST MT’09 Arabic-English translation task.
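To make the idea of a probability-scored alignment matrix concrete, here is a minimal sketch. The abstract does not specify the model's features or training procedure, so everything named below is an assumption for illustration only: the per-link logistic form stands in for the "exponential model", and `toy_features`, `TOY_WEIGHTS`, and the bias value are hypothetical and hand-set, not taken from the paper.

```python
import math


def link_probability(features, weights, bias=0.0):
    """Binary logistic (exponential-family) model: p(link is active | features)."""
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def toy_features(src_word, i, tgt_word, j, src_len, tgt_len):
    """Hypothetical features for the link (i, j); a real system would use richer ones."""
    return {
        # 1.0 when the two tokens are the same string (e.g. numbers, names)
        "identical": 1.0 if src_word == tgt_word else 0.0,
        # difference between the relative positions of the two words
        "position_gap": abs(i / src_len - j / tgt_len),
    }


# Hand-set weights, purely illustrative (a real model would learn them from data).
TOY_WEIGHTS = {"identical": 4.0, "position_gap": -3.0}


def weighted_alignment_matrix(source, target, weights, bias=0.0):
    """Return a len(source) x len(target) matrix of link probabilities."""
    return [
        [
            link_probability(
                toy_features(s, i, t, j, len(source), len(target)),
                weights,
                bias,
            )
            for j, t in enumerate(target)
        ]
        for i, s in enumerate(source)
    ]


matrix = weighted_alignment_matrix(["a", "b"], ["a", "b"], TOY_WEIGHTS, bias=-1.0)
for row in matrix:
    print(" ".join(f"{p:.3f}" for p in row))
```

Instead of thresholding this matrix into one hard alignment, a downstream extraction step can keep every cell and weight candidate phrase pairs by the probabilities of the links they rely on.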


Related resources

Discriminative Phrase-based Lexicalized Reordering Models using Weighted Reordering Graphs

Lexicalized reordering models play a central role in phrase-based statistical machine translation systems. Starting from the distance-based reordering model, improvements have been made by considering adjacent words in word-based models, adjacent phrase pairs in phrase-based models, and finally, all phrase pairs in a sentence pair in the reordering graphs. However, reordering graphs treat all ...


Weighted Alignment Matrices for Statistical Machine Translation

Current statistical machine translation systems usually extract rules from bilingual corpora annotated with 1-best alignments. They are prone to learn noisy rules due to alignment mistakes. We propose a new structure called weighted alignment matrix to encode all possible alignments for a parallel text compactly. The key idea is to assign a probability to each word pair to indicate how well the...


Reordering Modeling using Weighted Alignment Matrices

In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. The use of weighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. We propose two algorithms to generate the well-known MSD reordering model usi...


Improved Discriminative Bilingual Word Alignment

For many years, statistical machine translation relied on generative models to provide bilingual word alignments. In 2005, several independent efforts showed that discriminative models could be used to enhance or replace the standard generative approach. Building on this work, we demonstrate substantial improvement in word-alignment accuracy, partly through improved training methods, but predomi...


Discriminative Alignment Training without Annotated Data for Machine Translation

In present Statistical Machine Translation (SMT) systems, alignment is trained in a stage prior to the translation model. Consequently, alignment model parameters are tuned for the translation task only indirectly. In this paper, we propose a novel framework for discriminative training of alignment models with automated translation metrics as the maximization criterion. In th...



Publication date: 2011